Dr James Bartlett
Approaches to statistical inference and wanting to test the null
Target data set: A Comparison of Students’ Statistical Reasoning After Being Taught With R Programming Versus Hand Calculations
How do our inferences change depending on the approach?
Equivalence testing
Bayes factors
Bayesian ROPE
Is the point-null plausible? Would rejecting the null be surprising? Do you want to make decisions about how to act with a given error rate? (Lakens, 2021)
Meehl’s paradox: With increasing sample size, it's easier to confirm a hypothesis via rejecting a point-null (Kruschke & Liddell, 2018)
Crud factor: In non-randomised studies, we might expect non-null effects (Orben & Lakens, 2020), but would they be meaningful?
Frequentist
Bayesian
Technology or Tradition? A Comparison of Students’ Statistical Reasoning After Being Taught With R Programming Versus Hand Calculations (Ditta & Woodward, 2022)
Compared conceptual understanding of statistics at the end of a 10-week intro course
Students completed one of two versions:
Formula-based approach to statistical tests (n = 57)
R code approach to statistical tests (n = 60)
Research question: Does learning through hand calculations or R code lead to greater conceptual understanding of statistics?
Between-subjects IV: Formula-based or R code approach course
DV: Final exam (conceptual understanding questions) score as proportion correct (%)
Welch Two Sample t-test
data: e3total by condition
t = -1.117, df = 110.97, p-value = 0.2664
alternative hypothesis: true difference in means between group HC and group R is not equal to 0
95 percent confidence interval:
-7.584355 2.116173
sample estimates:
mean in group HC mean in group R
69.29091 72.02500
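The Welch t-test output above can be reproduced with base R's `t.test()`. A minimal sketch, assuming the data frame is called `Ditta_data` with the `e3total` and `condition` columns shown in the output:

```r
# Welch t-test (unequal variances is the default in base R)
# Ditta_data is an assumed name for the data frame
t.test(e3total ~ condition, data = Ditta_data)
```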
Keeping frequentist
Going Bayesian
Bayes factors (the authors report these)
Bayesian Region of Practical Equivalence (ROPE)
Figure from Lakens (2017)
Key decisions to make
What alpha value to use?
What values to use for the smallest effect size of interest boundaries?
Welch Modified Two-Sample t-Test
Hypothesis Tested: Equivalence
Equivalence Bounds (raw): -10.000 & 10.000
Alpha Level: 0.05
The equivalence test was significant, t(110.97) = 2.968, p = 1.83e-03
The null hypothesis test was non-significant, t(110.97) = -1.117, p = 2.66e-01
NHST: don't reject null significance hypothesis that the effect is equal to zero
TOST: reject null equivalence hypothesis
TOST Results
t SE df p.value
t-test -1.117011 2.447684 110.97 2.664022e-01
TOST Lower 2.968483 2.447684 110.97 1.833635e-03
TOST Upper -5.202506 2.447684 110.97 4.542552e-07
Effect Sizes
estimate SE lower.ci upper.ci conf.level
Raw -2.7340909 2.447684 -6.7940673 1.3258855 0.9
Hedges' g(av) -0.2073357 0.188892 -0.5208061 0.1001411 0.9
Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
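An equivalence test like the one above can be run with TOSTER's `t_TOST()`. A sketch assuming the current TOSTER formula interface and the same `Ditta_data` data frame; `eqb = 10` sets symmetric raw equivalence bounds of ±10 percentage points:

```r
library(TOSTER)
# Equivalence test with raw bounds of ±10 percentage points
t_TOST(e3total ~ condition, data = Ditta_data,
       eqb = 10, alpha = 0.05)
```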
t-test: Not significantly different to 0
Equivalence test: Statistically equivalent using bounds of ±10%, but not ±5%
Bayes factor: TBD
Bayesian ROPE: TBD
Relative predictive performance of two competing hypotheses (Van Doorn et al., 2021)
How much should we shift our prior belief between two competing hypotheses after observing data?
Typically comparing a null model vs an alternative model
BF10 = 4.57 would mean data 4.57 times more likely under alternative model than the null model
BF01 = 2.34 would mean data 2.34 times more likely under null model than the alternative model
Key decisions to make
What is your prior for the alternative hypothesis?
What level of evidence would be convincing?
Bayes factor analysis
--------------
[1] Null, mu1-mu2=0 : 2.872235 ±0.02%
Against denominator:
Alternative, r = 0.707106781186548, mu =/= 0
---
Bayes factor type: BFindepSample, JZS
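The `BFindepSample` output above comes from the BayesFactor package. A sketch using `ttestBF()` with the default medium prior (r = 0.707), again assuming the `Ditta_data` data frame:

```r
library(BayesFactor)
# JZS Bayes factor for an independent-samples design
bf <- ttestBF(formula = e3total ~ condition, data = Ditta_data,
              rscale = "medium")  # also try "wide" or "ultrawide"
bf      # BF10: evidence for the alternative over the null
1 / bf  # BF01: evidence for the null over the alternative
```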
Rough strength of evidence guidelines (Van Doorn et al., 2021)
BF 1–3 = Weak evidence
BF 3–10 = Moderate evidence
BF > 10 = Strong evidence
We get somewhat consistent conclusions of weak to moderate evidence in favour of the null:
| Prior | Bayes factor |
|---|---|
| Medium | 2.87 |
| Wide | 3.84 |
| Ultrawide | 5.25 |
t-test: Not significantly different to 0
Equivalence test: Statistically equivalent using bounds of ±10%, but not ±5%
Bayes factor: Weak to moderate evidence in favour of the null hypothesis compared to the alternative
Bayesian ROPE: TBD
Applies Bayesian inference to regression models (e.g., Heino et al., 2018)
Define a descriptive model of parameters
Specify prior probability distributions for model parameters
Update prior to posterior distributions using Bayesian inference
Interpret model and parameter posterior distributions
Compares parameter posterior distributions to a rejection region
Similar to equivalence testing, it yields three possible decisions: 1) HDI entirely outside the ROPE (reject the null value), 2) HDI entirely within the ROPE (accept for practical purposes), 3) HDI and ROPE partially overlap (undecided)
Figure from Masharipov et al. (2021)
brms (Bürkner, 2017) provides flexible Bayesian modelling
bayestestR (Makowski et al., 2019) for helpful summary and plotting functions
Key decisions to make
Prior for each parameter
Boundaries for ROPE
We can get a summary of our intercept and coefficient from the Bayesian regression model
The 95% HDI for the coefficient (mean difference) is entirely within ROPE bounds of ±10%:
| Parameter | Median | Lower 95% CI | Higher 95% CI | ROPE % |
|---|---|---|---|---|
| Intercept | 69.46 | 65.97 | 72.91 | 0 |
| Condition | 2.68 | -2.16 | 7.32 | 1 |
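The ROPE percentages can be computed with bayestestR's `rope()` function. A sketch assuming the fitted brms model is stored in `Ditta_fit` (a hypothetical name) and a raw ROPE of ±10:

```r
library(bayestestR)
# Percentage of the 95% HDI falling inside a raw ROPE of ±10
# Ditta_fit is an assumed name for the brms model with default priors
rope(Ditta_fit, range = c(-10, 10), ci = 0.95, ci_method = "HDI")
```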
t-test: Not significantly different to 0
Equivalence test: Statistically equivalent using bounds of ±10%, but not ±5%
Bayes factor: Weak to moderate evidence in favour of the null hypothesis compared to the alternative
Bayesian ROPE: We can accept the ROPE of ±10% around the coefficient posterior, but not ±5%.
RQ: There was no meaningful difference in conceptual understanding of statistics between the formula-based and R-code-based courses
Across frequentist and Bayesian approaches, we reach broadly similar conclusions, but analytic decisions still affected them:
What boundaries do you use for the smallest effect size of interest?
What prior would you use for the alternative hypothesis when calculating Bayes factors?
New (work in progress) PsyTeachR book where chapters 9 and 10 cover Bayes factors / modelling
Comparing equivalence testing and Bayes factors (Lakens et al., 2020)
Introduction to Bayes and ROPE (Kruschke & Liddell, 2018)
Comparing frequentist vs Bayesian modelling (Flores et al., 2022)
Thank you for listening!
Any questions?
What is your preferred approach to statistical inference?
What approaches have you used to argue there was no meaningful effect?
```r
# Refit the model, this time specifying our informed priors
# rather than the default flat priors
Ditta_model <- bf(e3total ~ condition)

Ditta_priors <- c(prior(normal(50, 16), class = Intercept),
                  prior(normal(0, 3), class = b))

Ditta_fit2 <- brm(
  prior = Ditta_priors,      # Specify informed priors
  formula = Ditta_model,     # Formula we defined above
  data = Ditta_data,         # Data frame we're using
  family = gaussian(),
  seed = 1908,
  file = "Data/Ditta_model2" # Save the model as a .rds file
)
```

| Model | Parameter | Median | Lower 95% CI | Higher 95% CI | ROPE % |
|---|---|---|---|---|---|
| Default priors | Intercept | 69.46 | 65.97 | 72.91 | 0 |
| User priors | Intercept | 69.74 | 66.68 | 72.84 | 0 |
| Default priors | Condition | 2.68 | -2.16 | 7.32 | 1 |
| User priors | Condition | 1.66 | -2.14 | 5.36 | 1 |